seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"
seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"
out <- statSignf(seq1, seq2)The names of the sequences are 'random sequence 1 consisting of 20 residues' and
'random sequence 1 consisting of 20 residues'.
These sequences are of type 'Protein'.
The type of alignment that has been done here is 'global'.
The substitution matrix used is 'BLOSUM62'.
The gap scores are -3 for open penalty and -1 for extended penalty.
The number of shuffles done are 1000.
The shuffling was done on the first sequence mentioned above.
The original score is 1.
The estimations of the parameters of the Gumbel distribution are:
Scale parameter lambda = 0.2393786 and mode u = -1.859929.
The p-value estimated using the estimated Gumbel distribution is 0.3960651.
The p-value estimated empirically by counting in the histogram is 0.411.
The estimated K is 0.001601696.
The standardized score is S' = 0.684606.
Summary of the 1000 scores calculated after shuffling:
Min. 1st Qu. Median Mean 3rd Qu. Max
-15.000 -3.000 0.000 0.551 4.000 20.000
out()seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"
seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"
out <- statSignf(seq1, seq2, num.shuffles = 3000,
subst.matrix = "BLOSUM45", kind.align = "local")The names of the sequences are 'random sequence 1 consisting of 20 residues' and
'random sequence 1 consisting of 20 residues'.
These sequences are of type 'Protein'.
The type of alignment that has been done here is 'local'.
The substitution matrix used is 'BLOSUM45'.
The gap scores are -3 for open penalty and -1 for extended penalty.
The number of shuffles done are 3000.
The shuffling was done on the first sequence mentioned above.
The original score is 22.
The estimations of the parameters of the Gumbel distribution are:
Scale parameter lambda = 0.287885 and mode u = 20.82963.
The p-value estimated using the estimated Gumbel distribution is 0.5102972.
The p-value estimated empirically by counting in the histogram is 0.4846667.
The estimated K is 1.005085.
The standardized score is S' = 0.336933.
Summary of the 3000 scores calculated after shuffling:
Min. 1st Qu. Median Mean 3rd Qu. Max
13.00000 20.00000 22.00000 22.83433 26.00000 42.00000
out()seq1 <- ">non-random sequence 1 very similar to 2.
KKKKKKKKKKRRRRNNRRLLLMMMNM"
seq2 <- ">non-random sequence 2 very similar to 1.
KKKKKKKKKKKRRRRRRLLLLMMMNN"
out <- statSignf(seq1, seq2, num.shuffles = 2000,
subst.matrix = "BLOSUM100", shuffle.seq = 2, gap.scores = c(8, 2))The names of the sequences are 'non-random sequence 1 very similar to 2' and
'non-random sequence 2 very similar to 1'.
These sequences are of type 'Protein'.
The type of alignment that has been done here is 'global'.
The substitution matrix used is 'BLOSUM100'.
The gap scores are -8 for open penalty and -2 for extended penalty.
The number of shuffles done are 2000.
The shuffling was done on the second sequence mentioned above.
The original score is 199.
The estimations of the parameters of the Gumbel distribution are:
Scale parameter lambda = 0.06495775 and mode u = 51.15488.
The p-value estimated using the estimated Gumbel distribution is 6.747724e-05.
The p-value estimated empirically by counting in the histogram is 0.
The estimated K is 0.04103675.
The standardized score is S' = 9.603686.
Summary of the 2000 scores calculated after shuffling:
Min. 1st Qu. Median Mean 3rd Qu. Max
7.0000 46.0000 59.0000 60.0395 73.0000 139.0000
out()seq1 <- ">random sequence 1 consisting of 500 residues.
YWWCSKEWDHFPDVTTCTPSEQYPAWAMHNMILPNWQLAEMMQSYRYIPNIALPNHQLTK
AHSMWAQHSMYVQRCCETKDLLFNDDTSQDKAYPFKQMESNHHITNEFPNKCKILTSVKH
ATLWNLQALGCCKDPCRSNKFHKKLNIDIHCSPGWTGWNAQYSSPGPFCWEPKTVNYSWF
RFKWHYFPCVHNIGSSRQCVWLRYFHLSSEQWKQGARGWVVVIFGACSGWYPWDNGQLYQ
EKNIFCAGKGQCATDQYFNYLWSWMISAGWAVYPYDECVTRTVISLFEVAYYFRHPYMWH
NITIMLRNETLPAVTQCVLETLHTHYCLALWLEEMYPCTDEGYNRKTPGDTHVQDCAFFS
ESHKHDVKTNWVGSDSINYNPGSVMWKICDHLGAGMYGRPTPADWSQSIITNHICCGDAS
HCGEQWCAVNNDSVSTMISQFQTSEWAHPIVINQHGIEPDMSWGELARVLTQNPGLGQQN
TIRMKTFYRKFFPCMYFNFQ"seq2 <- ">random sequence 1 consisting of 500 residues.
FWMLVNRLCQDTGWEYADCRPDHGNESRMMYKDCFYHITDPAEATVFPIESFCQHLCSNF
WSNTHWRIINYPPLHWKNYCGSAFWGRNYGEWSCFEDRMPFATEIETHSNPVFNDFQEAQ
CNNQARKKNGWKVSAPQEAMPQGMMRLTWIFIHEMWGWFSWVWRLIQNQINEPGVKPLEL
CSETQHGVFGWRDIAIMIRMEYKCDVLFMWLILCPCYYSDQFCKRIAAHQEAMRWKIAKT
AWQFVGKIKGPKCRMLTERRQFACEVTEQACKRCLPHAVRHTGQSHKYHCTAFRPFTKIS
VAYGVEIGESFFFQWYWQFRFSFATAWAISNWGQGWSLCIEEVNWNDIKEHFTFTTKIML
VFTENFTEHSQEIFDSLMDHGPDHKVSIIARQMVACQIWITHKMGCCHDGLLYTPTHDWF
LQVRVPCFFQWVEGWNGIEDDIPGHVRMNSMTSWLNKPLASRILLDIYIMTWDNRWFKWD
KTWMVVGHNIHDAWSFENVK"out()seq1 <- ">random sequence 1 consisting of 800 bases.
ctattggccgttggaggggaagtgtatcgactcgtcaagcgctctaatgagaaaccggct
tactcgccctcttatgatccttgacaagcgtctgactccgcaaggatcccgggctgaggg
ctacacccacctcgtttttagttttttaacagcagaaatcagcccgacgaagaacgagga
aggttctattaagagccgacggatccgaagaacacgtggtccctgacccttagaaccgat
tcgcgggctaaaaagcaacacgcttcgacccgatctgtcgcattgaattgaaacgcccct
ttattgtagctactaaccaggcttccttatagctattagcagggctggaagtgcggattc
actctaagacggtcacagctcatcagctcccctgcctgtagctgaacgtaccaatatctg
caagtaacaaccttgagtaccgtaagactgtaaaaatcttcgattccgcattatagggca
aggctaacgcatttccgccaaaggagaatgaggttgttgcattcccgtgagtgaaaggtc
ttagaaaacaagttcgatgcgcaggaggttcagattctgttccccccgtcgcgccgcgcc
tcaaaacccggtaccttcgcaaggagcccccagactctatggctaatcacttgtttgcaa
gaggctctgatccaaatgacatcaacctaacgcacaaagctgaccgaatataaacgattt
tctagcatatcgcgttttaattcgctagctgcagctttcgcatccgtcgccgctttaggc
ggggcgaacatcctgacccg"seq2 <- ">random sequence 1 consisting of 800 bases.
gagtgttttcattagcatccgcttagcatgggggtatgctgtactagtagacgtcagaga
taaggactggtcataattcaaggcgccgccatactaatgttagccttgttaatgactaat
aaatcctccccggaaatccataacaacgaagcctcgcgattacccgcaacgagaaaaact
caataaagggaaggcagacaaatgtatgcagatagtgcatgttctgcgaacactccacgt
atttgaacatataatcctaaaggtgtgttacccgagacactccgtccactctcgtcggta
acataagggtcgtcttttagtaacttccagcggtcatactaacaggctcatcgcactggt
agattctgtgcgggtaaacgcattatagacttgtgtcgagcccccggtgctggcgaattg
tccgcaggggcgacaggagtcaaaacgtgcttgaacaagaccgaacctatgtccttctga
gacgggatctccaatcgccccactgcggagcccctggcaccgcttatgccctagcagtca
ctgtacaaaacctatagccgtgccaccactcagcccgtctacggctgcaccgctactcaa
gtcccgagactcacgctgctcgaactgccttactctggtcaatagagcttaggaacctgt
gtaacggcatgcgttctgtgttagttaaatcgtgacatggcgcccaagcacagtcataat
tattgaaagttatctacatgggatgatgagcactgagggacgcaccgcgcctttatccat
ctatgccgcggcccgttcca"out()seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"
seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"
out <- statSignf(seq1, seq2,
plot = F, suppress.output = T)# out() gives an error, since 'out' is no longer a function.
outNULL
---
title: "Statistical Significance of a Pairwise Alignment of Sequences"
author: "Alex-Alex-Helena"
date: "10.11.2021"
output:
flexdashboard::flex_dashboard:
orientation: rows
social: menu
source_code: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(StatSignfPairSeqAlign)
```
Simple Example
====================================================================================================================================
### Output {data-width=450}
```{r}
seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"
seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"
out <- statSignf(seq1, seq2)
```
### Plot {data-width=550}
```{r}
out()
```
Local Alignment
====================================================================================================================================
### Output {data-width=450}
```{r}
seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"
seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"
out <- statSignf(seq1, seq2, num.shuffles = 3000,
subst.matrix = "BLOSUM45", kind.align = "local")
```
### Plot {data-width=550}
```{r}
out()
```
Non-random Sequences
====================================================================================================================================
### Output {data-width=450}
```{r}
seq1 <- ">non-random sequence 1 very similar to 2.
KKKKKKKKKKRRRRNNRRLLLMMMNM"
seq2 <- ">non-random sequence 2 very similar to 1.
KKKKKKKKKKKRRRRRRLLLLMMMNN"
out <- statSignf(seq1, seq2, num.shuffles = 2000,
subst.matrix = "BLOSUM100", shuffle.seq = 2, gap.scores = c(8, 2))
```
### Plot {data-width=550}
```{r}
out()
```
Longer Sequences
====================================================================================================================================
Column {.sidebar data-width=700}
-------------------------------------
### Output
```{r}
out <- statSignf(seq1, seq2,
subst.matrix = "PAM250", kind.align = "local")
```
Row {data-height=310}
-------------------------------------
### Sequence 1
```{r}
seq1 <- ">random sequence 1 consisting of 500 residues.
YWWCSKEWDHFPDVTTCTPSEQYPAWAMHNMILPNWQLAEMMQSYRYIPNIALPNHQLTK
AHSMWAQHSMYVQRCCETKDLLFNDDTSQDKAYPFKQMESNHHITNEFPNKCKILTSVKH
ATLWNLQALGCCKDPCRSNKFHKKLNIDIHCSPGWTGWNAQYSSPGPFCWEPKTVNYSWF
RFKWHYFPCVHNIGSSRQCVWLRYFHLSSEQWKQGARGWVVVIFGACSGWYPWDNGQLYQ
EKNIFCAGKGQCATDQYFNYLWSWMISAGWAVYPYDECVTRTVISLFEVAYYFRHPYMWH
NITIMLRNETLPAVTQCVLETLHTHYCLALWLEEMYPCTDEGYNRKTPGDTHVQDCAFFS
ESHKHDVKTNWVGSDSINYNPGSVMWKICDHLGAGMYGRPTPADWSQSIITNHICCGDAS
HCGEQWCAVNNDSVSTMISQFQTSEWAHPIVINQHGIEPDMSWGELARVLTQNPGLGQQN
TIRMKTFYRKFFPCMYFNFQ"
```
### Sequence 2
```{r}
seq2 <- ">random sequence 1 consisting of 500 residues.
FWMLVNRLCQDTGWEYADCRPDHGNESRMMYKDCFYHITDPAEATVFPIESFCQHLCSNF
WSNTHWRIINYPPLHWKNYCGSAFWGRNYGEWSCFEDRMPFATEIETHSNPVFNDFQEAQ
CNNQARKKNGWKVSAPQEAMPQGMMRLTWIFIHEMWGWFSWVWRLIQNQINEPGVKPLEL
CSETQHGVFGWRDIAIMIRMEYKCDVLFMWLILCPCYYSDQFCKRIAAHQEAMRWKIAKT
AWQFVGKIKGPKCRMLTERRQFACEVTEQACKRCLPHAVRHTGQSHKYHCTAFRPFTKIS
VAYGVEIGESFFFQWYWQFRFSFATAWAISNWGQGWSLCIEEVNWNDIKEHFTFTTKIML
VFTENFTEHSQEIFDSLMDHGPDHKVSIIARQMVACQIWITHKMGCCHDGLLYTPTHDWF
LQVRVPCFFQWVEGWNGIEDDIPGHVRMNSMTSWLNKPLASRILLDIYIMTWDNRWFKWD
KTWMVVGHNIHDAWSFENVK"
```
Row {data-height=690}
-------------------------------------
### Plot
```{r}
out()
```
DNA
====================================================================================================================================
Column {.sidebar data-width=650}
-------------------------------------
### Output
```{r}
out <- statSignf(seq1, seq2, kind.seq = "DNA", gap.scores = c(8, 2),
num.shuffles = 200)
```
Row {data-height=450}
-------------------------------------
### Sequence 1
```{r}
seq1 <- ">random sequence 1 consisting of 800 bases.
ctattggccgttggaggggaagtgtatcgactcgtcaagcgctctaatgagaaaccggct
tactcgccctcttatgatccttgacaagcgtctgactccgcaaggatcccgggctgaggg
ctacacccacctcgtttttagttttttaacagcagaaatcagcccgacgaagaacgagga
aggttctattaagagccgacggatccgaagaacacgtggtccctgacccttagaaccgat
tcgcgggctaaaaagcaacacgcttcgacccgatctgtcgcattgaattgaaacgcccct
ttattgtagctactaaccaggcttccttatagctattagcagggctggaagtgcggattc
actctaagacggtcacagctcatcagctcccctgcctgtagctgaacgtaccaatatctg
caagtaacaaccttgagtaccgtaagactgtaaaaatcttcgattccgcattatagggca
aggctaacgcatttccgccaaaggagaatgaggttgttgcattcccgtgagtgaaaggtc
ttagaaaacaagttcgatgcgcaggaggttcagattctgttccccccgtcgcgccgcgcc
tcaaaacccggtaccttcgcaaggagcccccagactctatggctaatcacttgtttgcaa
gaggctctgatccaaatgacatcaacctaacgcacaaagctgaccgaatataaacgattt
tctagcatatcgcgttttaattcgctagctgcagctttcgcatccgtcgccgctttaggc
ggggcgaacatcctgacccg"
```
### Sequence 2
```{r}
seq2 <- ">random sequence 1 consisting of 800 bases.
gagtgttttcattagcatccgcttagcatgggggtatgctgtactagtagacgtcagaga
taaggactggtcataattcaaggcgccgccatactaatgttagccttgttaatgactaat
aaatcctccccggaaatccataacaacgaagcctcgcgattacccgcaacgagaaaaact
caataaagggaaggcagacaaatgtatgcagatagtgcatgttctgcgaacactccacgt
atttgaacatataatcctaaaggtgtgttacccgagacactccgtccactctcgtcggta
acataagggtcgtcttttagtaacttccagcggtcatactaacaggctcatcgcactggt
agattctgtgcgggtaaacgcattatagacttgtgtcgagcccccggtgctggcgaattg
tccgcaggggcgacaggagtcaaaacgtgcttgaacaagaccgaacctatgtccttctga
gacgggatctccaatcgccccactgcggagcccctggcaccgcttatgccctagcagtca
ctgtacaaaacctatagccgtgccaccactcagcccgtctacggctgcaccgctactcaa
gtcccgagactcacgctgctcgaactgccttactctggtcaatagagcttaggaacctgt
gtaacggcatgcgttctgtgttagttaaatcgtgacatggcgcccaagcacagtcataat
tattgaaagttatctacatgggatgatgagcactgagggacgcaccgcgcctttatccat
ctatgccgcggcccgttcca"
```
Row {data-height=550}
-------------------------------------
### Plot
```{r}
out()
```
Suppress Output and Plotting
====================================================================================================================================
### Output {data-width=500}
```{r}
seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"
seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"
out <- statSignf(seq1, seq2,
plot = F, suppress.output = T)
```
### Plot {data-width=500}
```{r}
# out() gives an error, since 'out' is no longer a function.
out
```